Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] OWColor: Fix propagating changes to the output #2379

Merged
merged 2 commits into from
Jul 10, 2017

Conversation

nikicc
Copy link
Contributor

@nikicc nikicc commented Jun 5, 2017

Issue

Fixes #2378

Description of changes

Assure that when domains are equal, from_tables sets the resulting table's domain to the instance that was passed on the input.

Includes
  • Code changes
  • Tests
  • Documentation

@nikicc nikicc requested a review from janezd June 5, 2017 15:19
@codecov-io
Copy link

codecov-io commented Jun 5, 2017

Codecov Report

Merging #2379 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2379      +/-   ##
==========================================
- Coverage   74.33%   74.32%   -0.01%     
==========================================
  Files         321      321              
  Lines       55988    55990       +2     
==========================================
  Hits        41616    41616              
- Misses      14372    14374       +2

@@ -350,6 +350,7 @@ def set_data(self, data):
self._create_proxies(domain.metas))
self.openContext(data)
self.data = data.transform(domain)
self.data.domain = domain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need a comment here, at the very least.

But there may be a more general problem here. One would expect that after self.data = data.transform(domain), we would have self.data.domain is domain, while transform only ensures self.data.domain == domain. We should either change transform to (always) compare domains with is, not ==, or we can add a flag to specify how to compare domains.

@lanzagar, @astaric?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change transform to compare with is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMIIW, but changing Table.from_table to compare domains with is instead of == doesn't look like a trivial change. Also it by itself doesn't make OWColor work — the problem then arises in DomainConversion, which doesn't seem to know how to do the transformation. As a result, we get a data table with all values missing.

Before, as we checked with == if domains matched we called from_table_rows which just makes a selection of rows and keeps the provided domain. Now, if domain1 == domain2 but domain1 is not domain2 we need to make a domain conversion.

It seems like the core of the problem with all values missing is in how DomainConversion checks whether some variable is inside a domain. The relevant part of the code from domain.py:

    def __contains__(self, item):
        """
        Return `True` if the item (`str`, `int`, :class:`Variable`) is
        in the domain.
        """
        return item in self._indices

It seems like item in self._indices == False while item in list(self._indices.keys()) == True. Is this a result of how we compare variables?

Should we maybe just revert to self.data.domain = domain as it was before @janezd refactoring? If so, @janezd can you provide an explanation, since I'm not quite sure how variable's proxies work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first problem sounds like a broken hash on Variable.

As for the fix, I would rather move the data.domain = domain to transformmethod than have every caller do this explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astaric but shouldn't then we fix Table.from_table as well to prevent from_table and transform to behave differently? Isn't data.transform currently just a syntactic sugar?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is also possible. As we have changed all occurrences of Table.from_table with data.transform, I kind of consider it deprecated, but yeah, we can put the change there.

Copy link
Contributor

@janezd janezd Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and risk @janezd changing it back to data.transform the next time he sees this code? If data.transform is syntactic sugar it has to have the same behaviour as the thing that this sugar is coating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and risk @janezd changing it back to data.transform the next time he sees this code?

@janezd change to data.transform didn't break anything (transform and to_table behave the same way). Removing of the extra self.data.domain = domain is the change that broke things.

If data.transform is syntactic sugar it has to have the same behaviour as the thing that this sugar is coating.

Agree. Hence, if we are going to fix this in the core we need to do it in from_table. And while I'm all for changing to_table to compare domains with is instead of ==, I'm afraid this might cause other issues all over the place. As it has revealed some problems in domain's __contains__ which I'm not sure how to solve. Any help appreciated 🙈

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if anyone wishes to play around, here is the minimal example of the problem, after the change introduced in this PR:

import Orange
from Orange.data import Table


def _create_proxies(variables):
    part_vars = []
    for var in variables:
        if var.is_discrete or var.is_continuous:
            var = var.make_proxy()
            if var.is_discrete:
                var.values = var.values[:]
        part_vars.append(var)
    return part_vars


data = Table('iris')
domain = data.domain
domain = Orange.data.Domain(_create_proxies(domain.attributes),
                            _create_proxies(domain.class_vars),
                            _create_proxies(domain.metas))

new_data = data.transform(domain)
print(new_data)

The new_data contains all missing values since variables from the destination domain are not in the source domain and compute values are not set.

@nikicc nikicc changed the title [FIX] OWColor: Fix propagating changes to the output [WIP] [FIX] OWColor: Fix propagating changes to the output Jun 13, 2017
This ensures that after one calls data.transform(domain) the domain of the resulting data is set to the instance that was passed to transform.
@nikicc nikicc changed the title [WIP] [FIX] OWColor: Fix propagating changes to the output [FIX] OWColor: Fix propagating changes to the output Jul 7, 2017
@nikicc
Copy link
Contributor Author

nikicc commented Jul 7, 2017

@astaric this should work now, please check.

@astaric astaric merged commit 84762b4 into biolab:master Jul 10, 2017
@nikicc nikicc deleted the owcolor-fixup branch July 10, 2017 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants